The Elephant and the Mouse: Non-Strict Fine-Grain Synchronization for Many-Core Architectures
Abstract
The I-Structure, a synchronization mechanism created under the dataflow model of computation, was introduced in the late 1970s. It exhibits three important features:
(1) It is dataflow-style synchronization: synchronization occurs only between I-Structure producer and consumer operations that access the same memory location.
(2) It is fine-grain: it synchronizes at a finer memory granularity than the whole data structure. For instance, it synchronizes on each individual array element, whereas barrier synchronization works at the level of the whole data structure.
(3) It is lenient (non-strict): an I-Structure load can be issued without blocking even before the corresponding I-Structure store is issued or completed.

This paper reports a study of I-Structures in the context of modern many-core chip architectures. The major points examined include:
• The creation of an I-Structure-style design that exploits a lenient synchronization model on a modern many-core architecture, the IBM Cyclops-64.
• The implementation and integration of our design in the DEEP emulation system, which can simulate the entire Cyclops-64 chip at gate level. This allows us to assess the feasibility of its hardware design and implementation.
• The demonstration of the advantages of I-Structure-style synchronization, especially its lenient synchronization feature, on the Cyclops-64 architecture through an experimental case study using wavefront computation. A quantitative comparison with traditional control-flow-based synchronization, such as signal-wait, is reported.
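The three features listed in the abstract can be illustrated with a small software sketch. It is not the paper's Cyclops-64 hardware design: the class name `IStructure`, the callback-based `load`, and the single-threaded execution are all assumptions made for illustration. Each array element has its own empty/full state (fine-grain), a load on an empty element is recorded and the caller continues (lenient), and only the store to that same element releases it (dataflow-style).

```python
from typing import Any, Callable, List

_EMPTY = object()  # sentinel: element has not been written yet


class IStructure:
    """Hypothetical write-once array whose loads may precede the matching store."""

    def __init__(self, size: int) -> None:
        self._values: List[Any] = [_EMPTY] * size
        # Per-element queue of consumers that arrived before the producer.
        self._deferred: List[List[Callable[[Any], None]]] = [[] for _ in range(size)]

    def load(self, index: int, consumer: Callable[[Any], None]) -> None:
        """Non-blocking read: run `consumer` now if full, otherwise defer it."""
        value = self._values[index]
        if value is _EMPTY:
            self._deferred[index].append(consumer)
        else:
            consumer(value)

    def store(self, index: int, value: Any) -> None:
        """Single-assignment write that also releases deferred consumers."""
        if self._values[index] is not _EMPTY:
            raise ValueError(f"element {index} was already written")
        self._values[index] = value
        waiting, self._deferred[index] = self._deferred[index], []
        for consumer in waiting:
            consumer(value)


def demo() -> List[str]:
    log: List[str] = []
    cells = IStructure(2)
    # The load is issued before the store; it is deferred, not blocked.
    cells.load(0, lambda v: log.append(f"early load of [0] got {v}"))
    log.append("load issued, caller keeps running")
    cells.store(0, 10)  # releases the deferred load on element 0 only
    cells.store(1, 20)
    cells.load(1, lambda v: log.append(f"late load of [1] got {v}"))
    return log


if __name__ == "__main__":
    for line in demo():
        print(line)
```

A concurrent or hardware version would additionally need atomic updates of the per-element state; that part is deliberately left out of this sketch.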
Similar resources
Efficient Synchronization for a Large-scale Multi-core Chip Architecture
Multi-core architectures are becoming mainstream, permitting increasing on-chip parallelism through hardware support for multithreading. Synchronization, especially fine-grain synchronization, is essential to the effective utilization of the computational power of high-performance large-scale multi-core architectures. However, designing and implementing fine-grain synchronization in such architect...
Efficient Fine-Grain Synchronization on a Multi-Core Chip Architecture: A Fresh Look
Multi-core chip architectures are becoming mainstream, permitting increasing on-chip parallelism through hardware support for multithreading. Fine-grain synchronization is essential to the effective utilization of the capacity provided by future high-performance multi-core architectures. However, there are also new challenges in realizing such fine-grain synchronization in large-scale multi-core c...
An Efficient Synchronisation Mechanism for Multi-Core Systems
The use of efficient synchronization mechanisms is crucial for implementing fine-grained parallel programs on modern shared-cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Cons...
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy-constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. As a first step in this work, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
A Study of Parallel Betweenness Centrality Algorithm on a Manycore Architecture
Large-scale graph analysis algorithms, such as those in the SCCA2 benchmarks studied in this paper, play an increasingly important role in high-performance computing applications. Unlike most traditional scientific computing applications, graph algorithms often show dynamic and irregular computing behavior. It is difficult to attain good performance on large scale conventional parallel arc...
Publication date: 2010